DETAILED ACTION

Response to Amendment 

1. This Office Action is in response to applicant's communication filed 26 January
2009, in response to the Office Action mailed 28 July 2008. The applicant's remarks 
and amendments to the claims and specification were considered, with the results that 
follow. 

2. In view of the appeal brief filed on 26 January 2009, PROSECUTION IS 
HEREBY REOPENED. New grounds of rejection are set forth below. 

To avoid abandonment of the application, appellant must exercise one of the 
following two options: 

(1) file a reply under 37 CFR 1.111 (if this Office action is non-final) or a reply
under 37 CFR 1.113 (if this Office action is final); or, 

(2) initiate a new appeal by filing a notice of appeal under 37 CFR 41.31 followed
by an appeal brief under 37 CFR 41.37. The previously paid notice of appeal fee and
appeal brief fee can be applied to the new appeal. If, however, the appeal fees set forth
in 37 CFR 41.20 have been increased since they were previously paid, then appellant
must pay the difference between the increased fees and the amount previously paid. 

A Supervisory Patent Examiner (SPE) has approved of reopening prosecution by 
signing below: 

3. Claims 1-7 remain pending in this application. 

Claim Rejections - 35 USC § 103

4. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

5. Claims 1, 3 and 4 are rejected under 35 U.S.C. 103(a) as being unpatentable
over Dally (US 6,192,384) in view of Garde (US 6,510,510). 

As per claim 1, Dally teaches a multi-issue processor comprising a register file as
[a parallel processing computer system (column 1, lines 9-13) including a stream 
register file 14 (figure 1)]; and a plurality of issue slots as [ALU clusters 0-7 (numeral 
18, figure 1)], each one of the plurality of issue slots including a plurality of functional 
units as [each ALU cluster 18 includes a number of ALUs 26 (figures 2-3)], an input 
routing network that provides multiple data path outputs for a single data path input as 
[crosspoint switch 30 distributes the inputs to the ALUs 26, from a single input 
from the stream register file (SRF) (figures 2-3)], the input routing network receiving 
data from the register file on the single data path input via a single data input path and 
providing data from the register file to functional units of the plurality of functional units, 
the data provided on the multiple data path outputs via multiple data output paths as 
[crosspoint switch 30 outputs the operands to the ALUs 26, from a single input 
from the stream register file (SRF) 14 (figures 1-3 and column 4, lines 38-58)], and 
a plurality of holdable registers that hold duplicate data from the register file, wherein in 
a first set of the plurality of issue slots the holdable registers store data on the multiple
data output paths of the first set as [local register files 28 buffer the inputs to the 
ALUs 26 and store local constants, parameters and variables for the cluster, 
where the local register files 28 are fed by the crosspoint switch 30 (figures 1-3 
and column 4, lines 38-58)]. 

Dally does not explicitly teach a second set of the plurality of issue slots the 
holdable registers store data on the single data input path corresponding to the input 
routing networks of the second set. 

Garde teaches a second set of the plurality of issue slots the holdable registers 
store data on the single data input path corresponding to the input routing networks of 
the second set as [operand latch 132 holds the output of the registers 130 on the 
input to the routing network of op busses 110 and 112 to the computation units 
(column 6, lines 3-14 and figure 2)]. 

Dally and Garde are analogous art, as they are within the same field of endeavor, 
namely connecting a register file to multiple functional/computation units. 

It would have been obvious to one of ordinary skill in the art, at the time the 
invention was made, to use the operand latches for the register file output of Garde on 
the outputs of the stream register file/input to the crosspoint switches for some of the 
clusters taught by Dally. 

Because both Dally and Garde teach systems with a register file output 
connected to a series of inputs of a number of functional/computational units, and both 
including latches/registers on the inputs of the individual functional/computational units, 
it would have been obvious to one of ordinary skill in the art to use the operand latches
for the register file output of Garde on the outputs of the stream register file/input to the 
crosspoint switches for some of the clusters taught by Dally, to achieve the predictable 
result of latching the output of the stream register file before it is sent via the crosspoint 
switches to the various ALUs. This extra level of registers between the register file and 
computational units also can provide an additional pipeline stage, allowing higher clock 
frequency, if desired, as described in the applicant's admitted prior art on page 2, lines 
8-25, of the specification. 

As per claim 3, Dally teaches wherein the input routing network of each of the 
plurality of issue slots has a plurality of data path inputs as [the stream register file
sends data to the ALU clusters via the crosspoint switches (figures 1-3) and 
crosspoint switch 30 outputs the operands to the ALUs 26, from a single input 
from the stream register file (SRF) 14 as well as an input from outputs of the other 
ALUs in the cluster (figures 1-3 and column 4, lines 38-58)].

Dally does not explicitly teach that in the second set of issue slots holdable 
registers of the plurality of holdable registers are located between each of the inputs of 
the input routing network and the register file. 

Garde teaches the second set of issue slots' holdable registers of the plurality of 
holdable registers are located between each of the inputs of the input routing network 
and the register file as [operand latch 132 holds the output of the registers 130 on 
the input to the routing network of op busses 110 and 112 to the computation
units (column 6, lines 3-14 and figure 2)]. 

Dally and Garde are analogous art, as they are within the same field of endeavor, 
namely connecting a register file to multiple functional/computation units. 

It would have been obvious to one of ordinary skill in the art, at the time the 
invention was made, to use the operand latches for the register file output of Garde on 
the outputs of the stream register file/input to the crosspoint switches for some of the 
clusters taught by Dally. 

Because both Dally and Garde teach systems with a register file output 
connected to a series of inputs of a number of functional/computational units, and both 
including latches/registers on the inputs of the individual functional/computational units, 
it would have been obvious to one of ordinary skill in the art to use the operand latches 
for the register file output of Garde on the outputs of the stream register file/input to the 
crosspoint switches for some of the clusters taught by Dally, to achieve the predictable 
result of latching the output of the stream register file before it is sent via the crosspoint 
switches to the various ALUs. This extra level of registers between the register file and 
computational units also can provide an additional pipeline stage, allowing higher clock 
frequency, if desired, as described in the applicant's admitted prior art on page 2, lines 
8-25, of the specification. 

As per claim 4, Dally teaches wherein, in the first set of issue slots, holdable 
registers are located between the input routing networks and each of the plurality of 
function units as [local register files 28 buffer the inputs to the ALUs 26 and store
local constants, parameters and variables for the cluster, where the local register
files 28 are fed by the crosspoint switch 30 (figures 1-3 and column 4, lines 38-58)].

6. Claims 2 and 5-7 are rejected under 35 U.S.C. 103(a) as being unpatentable
over Dally (US 6,192,384) in view of Garde (US 6,510,510), and further in view of Fisher
(US 6,026,479).

As per claim 2, Dally teaches the multi-issue processor of claim 1, as described
above.

Dally does not teach a first instruction set accessing at least the first set of issue 
slots; and a second instruction set accessing the second set of issue slots, however. 

Fisher teaches "a first instruction set accessing at least the first set of issue slots; 
and a second instruction set accessing the second set of issue slots" as ["A CPU 
having a cluster VLIW architecture ... which operates in both a high instruction
level parallelism (ILP) mode and a low ILP mode. In high ILP mode, the CPU 
executes wide instruction words using all operational clusters of the CPU and all 
of a main instruction cache and main data cache of the CPU are accessible to a 
high ILP task. The CPU also includes a mini-instruction cache, a mini-instruction 
register and a mini-data cache which are inactive during high ILP mode. An 
instruction level controller in the CPU receives a low ILP signal, such as an 
interrupt or function call to a low ILP routine, and switches to low ILP mode. In 
low ILP mode, the main instruction cache and main data cache are deactivated to
preserve their contents. At the same time, a predetermined cluster remains active 
while the remaining clusters are also deactivated. The low ILP task executes 
instructions from the mini-instruction cache which are input to the predetermined 
cluster through the mini-instruction register. The mini-data cache stores 
operands for the low ILP task" (abstract, lines 1-19)].

Dally and Fisher are analogous art, as they are within the same field of endeavor, 
namely instruction processing. 

At the time of the invention it would have been obvious to a person of ordinary 
skill in the art to combine the parallel processor with a register file connected to a 
number of ALU clusters via crosspoint switches controlling the inputs of the functional 
units of each ALU in the clusters, taught by Dally, with the multiple instruction sets and 
multiple groupings of resources for each, as taught by Fisher.

The motivation for doing so is provided by Fisher as ["the separate mini- 
instruction cache and mini-data cache along with the use of only the 
predetermined cluster minimizes the pollution of the main instruction and data 
caches, as well as pollution of register files in the deactivated clusters, with 
regard to a task executing in high ILP mode" (abstract, lines 20-24)].

As per claim 5, Dally teaches the multi-issue processor of claim 1, as described
above.



Application/Control Number: 10/511,512 Page 9

Art Unit: 2183 

Dally does not teach wherein the first set of issue slots are accessed by a first set 
of instructions for a VLIW processor and the second set of issue slots are accessed by 
a second set of instructions that are used by an interrupt routine, however. 

Fisher teaches wherein the first set of issue slots are accessed by a first set of 
instructions for a VLIW processor and the second set of issue slots are accessed by a 
second set of instructions that are used by an interrupt routine as ["A CPU having a 
cluster VLIW architecture ... which operates in both a high instruction level
parallelism (ILP) mode and a low ILP mode. In high ILP mode, the CPU executes 
wide instruction words using all operational clusters of the CPU and all of a main 
instruction cache and main data cache of the CPU are accessible to a high ILP 
task. The CPU also includes a mini-instruction cache, a mini-instruction register 
and a mini-data cache which are inactive during high ILP mode. An instruction 
level controller in the CPU receives a low ILP signal, such as an interrupt or 
function call to a low ILP routine, and switches to low ILP mode. In low ILP mode, 
the main instruction cache and main data cache are deactivated to preserve their 
contents. At the same time, a predetermined cluster remains active while the 
remaining clusters are also deactivated. The low ILP task executes instructions 
from the mini-instruction cache which are input to the predetermined cluster 
through the mini-instruction register. The mini-data cache stores operands for the 
low ILP task" (abstract, lines 1-19)]. 

Dally and Fisher are analogous art, as they are within the same field of endeavor, 
namely instruction processing. 

At the time of the invention it would have been obvious to a person of ordinary
skill in the art to combine the parallel processor with a register file connected to a 
number of ALU clusters via crosspoint switches controlling the inputs of the functional 
units of each ALU in the clusters, taught by Dally, with the multiple instruction sets and 
multiple groupings of resources for each, as taught by Fisher.

The motivation for doing so is provided by Fisher as ["the separate mini- 
instruction cache and mini-data cache along with the use of only the 
predetermined cluster minimizes the pollution of the main instruction and data 
caches, as well as pollution of register files in the deactivated clusters, with 
regard to a task executing in high ILP mode" (abstract, lines 20-24)].

As per claim 6, Fisher teaches wherein the second set of instructions has fewer 
instructions than the first set of instructions as [An embodiment of a method for 
reducing cache pollution in a CPU, according to the present invention, includes 
providing a main instruction cache configured to store VLIW instructions, 
wherein each VLIW instruction is further comprised of a plurality of c- 
instructions, providing a plurality of operational clusters, wherein each one of the 
plurality of operational clusters is configured to receive one of the plurality of c- 
instructions of each VLIW instruction in the main instruction cache, and 
executing a high ILP task by loading VLIW instructions from the main instruction 
cache into a main instruction register for output to the plurality of clusters. The 
method includes receiving a low ILP signal and, responsive thereto, deactivating 
the main instruction cache and main instruction register, deactivating the
plurality of operational clusters, except for a predetermined one of the 
operational clusters, activating a mini-instruction cache and a mini-instruction 
register, and serially executing a low ILP task by serially loading c-instructions 
from the mini-instruction cache into the mini-instruction cache for output to the 
predetermined one of the operational clusters (column 4, lines 16-34)]. 

As per claim 7, Dally teaches the multi-issue processor of claim 1, as described
above.

Dally does not explicitly teach wherein the first set of issue slots has more issue 
slots than the second set of issue slots, however. 

Fisher teaches wherein the first set of issue slots has more issue slots than the 
second set of issue slots as ["The CPU also includes a mini-instruction cache, a 
mini-instruction register and a mini-data cache which are inactive during high ILP 
mode. An instruction level controller in the CPU receives a low ILP signal, such 
as an interrupt or function call to a low ILP routine, and switches to low ILP mode. 
In low ILP mode, the main instruction cache and main data cache are deactivated 
to preserve their contents. At the same time, a predetermined cluster remains 
active while the remaining clusters are also deactivated. The low ILP task 
executes instructions from the mini-instruction cache which are input to the 
predetermined cluster through the mini-instruction register. The mini-data cache 
stores operands for the low ILP task" (abstract, lines 6-19), wherein deactivating
some clusters means the cluster running the low ILP tasks (the second set of
issue slots) is smaller than the first]. 

Dally and Fisher are analogous art, as they are within the same field of endeavor, 
namely instruction processing. 

At the time of the invention it would have been obvious to a person of ordinary 
skill in the art to combine the parallel processor with a register file connected to a 
number of ALU clusters via crosspoint switches controlling the inputs of the functional 
units of each ALU in the clusters, taught by Dally, with the multiple instruction sets and 
multiple groupings of resources for each, as taught by Fisher.

The motivation for doing so is provided by Fisher as ["the separate mini- 
instruction cache and mini-data cache along with the use of only the 
predetermined cluster minimizes the pollution of the main instruction and data 
caches, as well as pollution of register files in the deactivated clusters, with 
regard to a task executing in high ILP mode" (abstract, lines 20-24)].

Response to Arguments 

7. Applicant's arguments with respect to Slavenburg and Martonosi have been 
considered but are moot in view of the new ground(s) of rejection. 

Dally (US 6,192,384) teaches a parallel processor with a stream register file 
connected, via crosspoint switches, to a number of ALU clusters, each of which includes 
a number of ALUs and a number of local registers at the inputs of those ALUs holding 
data from the stream register file. Garde (US 6,510,510) teaches an operand latch on 
the output of the register files, holding the operand from the register file before it is sent
over the connection to the computational units. 

Conclusion 

8. The following is a summary of the treatment and status of all claims in the 
application as recommended by M.P.E.P. 707.07(i): claims 1-7 are rejected. 

9. The prior art made of record and not relied upon is considered pertinent to 
applicant's disclosure. 

a. Balmer (US 2002/0108026) -- discloses holdable registers on register file
outputs for transferring data between datapaths. 

b. Hao (US 4,594,655) -- discloses staging registers on the outputs of the
register file for holding operands to be sent to the ALUs. 

10. The examiner requests, in response to this Office action, support be shown for 
language added to any original claims on amendment and any new claims. That is, 
indicate support for newly added claim language by specifically pointing to page(s) and 
line number(s) in the specification and/or drawing figure(s). This will assist the examiner 
in prosecuting the application. 

11. When responding to this office action, Applicant is advised to clearly point out the
patentable novelty which he or she thinks the claims present, in view of the state of the 
art disclosed by the references cited or the objections made. He or she must also show
how the amendments avoid such references or objections. See 37 CFR 1.111(c).

Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to GEORGE D. GIROUX whose telephone number is 
(571)272-9769. The examiner can normally be reached on Monday through Friday, 
9:30am - 6:00pm E.S.T. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Eddie P. Chan, can be reached on 571-272-4162. The fax phone number
for the organization where this application or proceeding is assigned is 571-273-8300. 

Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a 
USPTO Customer Service Representative or access to the automated information 
system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 

/Eddie P Chan/
Supervisory Patent Examiner, Art Unit 2183

/George D Giroux/
Examiner, Art Unit 2183



